---
title: Episodes
description: Learn how to use episodes to manage sequences of inferences that share a common outcome.
---

An episode is a sequence of inferences associated with a common downstream outcome.

For example, an episode could refer to a sequence of LLM calls associated with:

- Resolving a support ticket
- Preparing an insurance claim
- Completing a phone call
- Extracting data from a document
- Drafting an email

An episode can include calls to one or more functions, including multiple calls to the same function.
Your application can run arbitrary actions (e.g. interact with users, retrieve documents, actuate robotics) between function calls within an episode.
Though these actions are outside the scope of TensorZero, it's fine (and encouraged) to build your LLM systems this way.

The `/inference` endpoint accepts an optional `episode_id` field.
For the first inference request in an episode, you don't need to provide an `episode_id`: the gateway will create a new episode for you and return its `episode_id` in the response.
For subsequent inference requests, provide the `episode_id` you received in the first response.
The gateway will use the `episode_id` to associate those inference requests with the same episode.
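To make the chaining concrete, here's a minimal sketch of the request bodies involved. The two helper functions are hypothetical (not part of the TensorZero client); the field names follow the `/inference` API described above.

```python
# Hypothetical helpers illustrating how `episode_id` chains requests together.
# These are not part of the TensorZero client; field names follow `/inference`.


def first_request(function_name: str, text: str) -> dict:
    # The first request in an episode omits `episode_id`;
    # the gateway creates an episode and returns its ID in the response.
    return {
        "function_name": function_name,
        "input": {"messages": [{"role": "user", "content": text}]},
    }


def followup_request(function_name: str, text: str, previous_response: dict) -> dict:
    # Subsequent requests include the `episode_id` from an earlier response,
    # which ties the inferences into the same episode.
    return {
        "function_name": function_name,
        "episode_id": previous_response["episode_id"],
        "input": {"messages": [{"role": "user", "content": text}]},
    }
```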

<Tip>

You typically shouldn't generate episode IDs yourself.
The gateway will create a new episode ID for you if you don't provide one.
You can then use it for any other inferences you'd like to associate with the episode.

</Tip>

<Tip>

You can also find the runnable code for this example on [GitHub](https://github.com/tensorzero/tensorzero/tree/main/examples/guides/episodes).

</Tip>

## Scenario

In the [Quickstart](/quickstart/), we built a simple LLM application that writes haikus about artificial intelligence.

Imagine we want to separately generate some commentary about the haiku, and present both pieces of content to users.
We can associate both inferences with the same episode.

Let's define an additional function in our configuration file.

```toml title="tensorzero.toml"
[functions.analyze_haiku]
type = "chat"

[functions.analyze_haiku.variants.gpt_4o_mini]
type = "chat_completion"
model = "gpt_4o_mini"
```

<Accordion title="Full Configuration">

```toml title="tensorzero.toml"
[models.gpt_4o_mini]
routing = ["openai"]

[models.gpt_4o_mini.providers.openai]
type = "openai"
model_name = "gpt-4o-mini"

[functions.generate_haiku]
type = "chat"

[functions.generate_haiku.variants.gpt_4o_mini]
type = "chat_completion"
model = "gpt_4o_mini"

[functions.analyze_haiku]
type = "chat"

[functions.analyze_haiku.variants.gpt_4o_mini]
type = "chat_completion"
model = "gpt_4o_mini"
```

</Accordion>

## Inferences & Episodes

This time, we'll create a multi-step workflow that first generates a haiku and then analyzes it.
We won't provide an `episode_id` in the first inference request, so the gateway will generate a new one for us.
We'll then use that value in our second inference request.

```python title="run_with_tensorzero.py"
from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    haiku_response = client.inference(
        function_name="generate_haiku",
        # We don't provide an episode_id for the first inference in the episode
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "Write a haiku about artificial intelligence.",
                }
            ]
        },
    )

    print(haiku_response)

    # When we don't provide an episode_id, the gateway will generate a new one for us
    episode_id = haiku_response.episode_id

    # In a production application, we'd first validate the response to ensure the model returned the correct fields
    haiku = haiku_response.content[0].text

    analysis_response = client.inference(
        function_name="analyze_haiku",
        # For future inferences in that episode, we provide the episode_id that we received
        episode_id=episode_id,
        input={
            "messages": [
                {
                    "role": "user",
                    "content": f"Write a one-paragraph analysis of the following haiku:\n\n{haiku}",
                }
            ]
        },
    )

    print(analysis_response)
```

<Accordion title="Sample Output">

```python "01921116-0cd9-7d10-a9a6-d5c8b9ba602a"
ChatInferenceResponse(
    inference_id=UUID('01921116-0fff-7272-8245-16598966335e'),
    episode_id=UUID('01921116-0cd9-7d10-a9a6-d5c8b9ba602a'),
    variant_name='gpt_4o_mini',
    content=[
        Text(
            type='text',
            text='Silent circuits pulse,\nWhispers of thought in code bloom,\nMachines dream of us.',
        ),
    ],
    usage=Usage(
        input_tokens=15,
        output_tokens=20,
    ),
)

ChatInferenceResponse(
    inference_id=UUID('01921116-1862-7ea1-8d69-131984a4625f'),
    episode_id=UUID('01921116-0cd9-7d10-a9a6-d5c8b9ba602a'),
    variant_name='gpt_4o_mini',
    content=[
        Text(
            type='text',
            text='This haiku captures the intricate and intimate relationship between technology and human consciousness. '
                 'The phrase "Silent circuits pulse" evokes a sense of quiet activity within machines, suggesting that '
                 'even in their stillness, they possess an underlying vibrancy. The imagery of "Whispers of thought in '
                 'code bloom" personifies the digital realm, portraying lines of code as organic ideas that grow and '
                 'evolve, hinting at the potential for artificial intelligence to derive meaning or understanding from '
                 'human input. Finally, "Machines dream of us" introduces a poignant juxtaposition between human '
                 'creativity and machine logic, inviting contemplation about the nature of thought and consciousness '
                 'in both realms. Overall, the haiku encapsulates a profound reflection on the emergent sentience of '
                 'technology and the deeply interwoven future of humanity and machines.',
        ),
    ],
    usage=Usage(
        input_tokens=39,
        output_tokens=155,
    ),
)
```

</Accordion>

## Extras

### Supply your own episode ID

The gateway automatically generates episode IDs when you don't provide one.
If you must supply your own, generate a UUIDv7 and use it as the episode ID.

<Warning>

In Python, use `from tensorzero.util import uuid7` instead of the external `uuid7` package from PyPI.
The external `uuid7` library is broken and will cause `"Invalid Episode ID: Timestamp is in the future"` errors.

</Warning>
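For reference, a UUIDv7 embeds a 48-bit Unix timestamp in milliseconds in its most significant bits, which is why a non-compliant generator can trip the gateway's timestamp validation. Below is a rough sketch of the RFC 9562 bit layout; in practice, use `from tensorzero.util import uuid7` rather than rolling your own.

```python
import os
import time
import uuid


def uuid7_sketch() -> uuid.UUID:
    """Illustrative UUIDv7 generator (use `tensorzero.util.uuid7` in practice)."""
    unix_ts_ms = time.time_ns() // 1_000_000
    rand_a = int.from_bytes(os.urandom(2), "big") & 0xFFF  # 12 random bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 random bits
    value = (unix_ts_ms & ((1 << 48) - 1)) << 80  # 48-bit millisecond timestamp
    value |= 0x7 << 76  # version 7
    value |= rand_a << 64
    value |= 0b10 << 62  # RFC variant
    value |= rand_b
    return uuid.UUID(int=value)


print(uuid7_sketch())
```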

## Conclusion & Next Steps

Episodes are first-class citizens in TensorZero that enable powerful workflows for multi-step LLM systems.
You can use them alongside other features like [experimentation](/experimentation/run-adaptive-ab-tests), [metrics & feedback](/gateway/guides/metrics-feedback/), and [tool use (function calling)](/gateway/guides/tool-use).
For example, you can track KPIs for entire episodes instead of individual inferences, and later jointly optimize your LLMs to maximize these metrics.
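As a sketch, an episode-level metric can be configured with `level = "episode"` (the metric name here is hypothetical):

```toml title="tensorzero.toml"
[metrics.haiku_rating]
type = "float"
optimize = "max"
level = "episode"
```

Feedback sent to the `/feedback` endpoint with an `episode_id` is then attributed to the entire episode rather than a single inference.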
